- Subject: comp.speech Frequently Asked Questions - part 2/3
- Newsgroups: comp.speech,comp.answers,news.answers
- From: andrewh@speech.su.oz.au (Andrew Hunt)
- Date: 10 Nov 1994 01:29:10 GMT
-
- Archive-name: comp-speech-faq/part2
- Last-modified: 1994/11/04
-
-
- COMP.SPEECH FAQ POSTING - PART 2/3
-
-
- [Note: this document has been automatically extracted from
- a WWW site. This may introduce some formatting errors.]
-
-
-
- ===========================================================================
-
-
- FAQ SECTION 2 - Signal Processing for Speech
-
- Q2.1: WHAT SAMPLING DO I NEED FOR SPEECH?
-
- For recorded speech to be understood by humans you need an 8kHz
- sampling rate or more and at least 8 bit sampling. This produces poor
- quality speech - but it can be understood.
-
- Improvements can be achieved by increasing the number of bits in
- sampling to 12bits or 16bits, or by using a non-linear encoding
- technique such as mu-law or A-law (see Q2.7). This improves the
- "signal-to-noise" ratio.
-
- Increasing the sampling rate above 8kHz, say to 10kHz, 16kHz or 20kHz,
- improves the frequency response: a sampled signal can only represent
- frequencies below half the sampling rate, so the higher the sampling
- frequency the better the high frequency content will be. A 16kHz
- sampling rate is a reasonable target for high quality speech recording
- and playback.
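
The trade-offs above reduce to three small calculations. The sketch below is an illustration rather than part of the original posting: the function names are made up for this example, and the 6.02 dB-per-bit figure is the usual rule of thumb for an ideal linear quantizer driven by a full-scale sine wave, which real converters will not quite reach.

```c
/* Hypothetical helper names; none of these come from the FAQ itself. */

/* Highest frequency (Hz) that a given sampling rate can represent. */
double nyquist_hz(double sample_rate_hz)
{
    return sample_rate_hz / 2.0;
}

/* Bytes needed to store one second of uncompressed linear PCM. */
long pcm_bytes_per_second(long sample_rate_hz, int bits_per_sample,
                          int channels)
{
    return sample_rate_hz * bits_per_sample * channels / 8;
}

/* Rule-of-thumb SNR (dB) of an ideal linear quantizer, full-scale sine. */
double linear_pcm_snr_db(int bits_per_sample)
{
    return 6.02 * bits_per_sample + 1.76;
}
```

For example, nyquist_hz(8000.0) gives 4000 Hz, pcm_bytes_per_second(16000, 16, 1) gives 32000 bytes, and linear_pcm_snr_db(8) gives roughly 50 dB against roughly 98 dB for 16 bits.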
-
- When doing speech recognition you need to remember that your
- computer is not as good as your ear so it will have trouble with poor
- quality sounds. The choice of an appropriate sampling setup depends
- very much on the speech recognition task and the amount of computer
- power available.
- _________________________________________________________________
-
- Q2.2: HOW DO I FIND THE PITCH OF A SPEECH SIGNAL?
-
- This topic comes up regularly in the comp.dsp newsgroup. Question 2.5
- of the FAQ posting for comp.dsp gives a comprehensive list of
- references on the definition, perception and processing of pitch.
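
As a starting point while tracking down those references, one common textbook approach is to pick the strongest autocorrelation peak within a plausible pitch range. The sketch below assumes a single voiced frame of 16-bit samples. The function name and the search limits are choices made for this example, not something taken from the comp.dsp FAQ, and practical pitch trackers add voicing decisions, windowing and smoothing on top.

```c
#include <stddef.h>

/*
 * Estimate the pitch (Hz) of one frame from its autocorrelation peak.
 * Only lags corresponding to min_hz..max_hz are searched.
 * Returns 0.0 if the arguments are unusable or no positive peak is found.
 */
double estimate_pitch_hz(const short *x, size_t n, double sample_rate,
                         double min_hz, double max_hz)
{
    size_t min_lag, max_lag, lag, i, best_lag = 0;
    double best = 0.0;

    if (n < 2 || sample_rate <= 0.0 || min_hz <= 0.0 || max_hz <= 0.0)
        return 0.0;

    min_lag = (size_t)(sample_rate / max_hz);   /* shortest period */
    max_lag = (size_t)(sample_rate / min_hz);   /* longest period */
    if (min_lag == 0) min_lag = 1;
    if (max_lag >= n) max_lag = n - 1;

    for (lag = min_lag; lag <= max_lag; lag++) {
        double r = 0.0;
        for (i = 0; i + lag < n; i++)
            r += (double)x[i] * (double)x[i + lag];
        if (r > best) {           /* keep the strongest correlation */
            best = r;
            best_lag = lag;
        }
    }
    return best_lag ? sample_rate / (double)best_lag : 0.0;
}
```

Calling estimate_pitch_hz(frame, 800, 8000.0, 50.0, 400.0) on a 100 ms frame sampled at 8kHz searches lags of 20 to 160 samples.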
- _________________________________________________________________
-
- Q2.3: HOW DO I FIND THE START AND END POINTS OF A SPEECH SIGNAL?
-
- A large number of papers have been presented on this task. Try the
- following papers:
- * Rabiner LR, Sambur MR, "An Algorithm for Determining the Endpoints
- of Isolated Utterances", Bell System Technical Journal, Vol 54,
- No. 2, pp 297-315, 1975.
- * Drago, P.G. et al. "Digital Dynamic Speech Detectors." IEEE Trans
- on Communications, Vol 26, No 1, Jan 78, pp. 140-145.
- * Newman, W.C. "Detecting Speech with an Adaptive Neural Network."
- Electronic Design. 22 March 1990.
- * Taboada, J. et al. "Explicit Estimation of Speech Boundaries." IEE
- Proc. Sci. Meas. Technol., Vol 141, No. 3, May 1994, pp. 153-159.
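
The papers above describe robust detectors. As a much cruder illustration of the basic idea, the sketch below marks the first and last fixed-length frames whose mean energy exceeds a threshold. It is not the Rabiner-Sambur algorithm (which also uses zero-crossing rates and adaptive thresholds); the function name, frame length and threshold are assumptions made for this example.

```c
#include <stddef.h>

/*
 * Crude energy-based endpoint detector.
 * Splits x[0..n-1] into frames of frame_len samples and reports the
 * sample range [*start, *end) spanned by frames whose mean squared
 * amplitude exceeds threshold. Returns 1 if speech was found, else 0.
 */
int find_endpoints(const short *x, size_t n, size_t frame_len,
                   double threshold, size_t *start, size_t *end)
{
    size_t pos, i, first = 0, last = 0;
    int found = 0;

    if (frame_len == 0)
        return 0;

    for (pos = 0; pos + frame_len <= n; pos += frame_len) {
        double energy = 0.0;
        for (i = 0; i < frame_len; i++) {
            double s = (double)x[pos + i];
            energy += s * s;
        }
        energy /= (double)frame_len;      /* mean energy of this frame */

        if (energy > threshold) {
            if (!found) {
                first = pos;              /* first active frame */
                found = 1;
            }
            last = pos + frame_len;       /* end of latest active frame */
        }
    }
    if (found) {
        *start = first;
        *end = last;
    }
    return found;
}
```

A fixed threshold only works on clean recordings; in practice it is usually derived from the background noise measured over the first few frames.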
-
- _________________________________________________________________
-
- Q2.4: WHERE CAN I FIND FFT SOFTWARE?
-
- Try the following file available by anonymous ftp. It contains a
- series of optimised fft routines, including mixed-radix algorithms.
- The .gz suffix indicates GNU zip format.
- * ftp://usc.edu/pub/C-numanal/fft-stuff.tar.gz
-
- _________________________________________________________________
-
- Q2.5: WHAT SIGNAL PROCESSING TECHNIQUES ARE USED IN SPEECH TECHNOLOGY?
-
- This question is far too big to be answered in a FAQ posting.
- Fortunately there are many good books which answer the question. Some
- good introductory books include
- * Digital processing of speech signals; L. R. Rabiner, R. W.
- Schafer. Englewood Cliffs; London: Prentice-Hall, 1978
- * Voice and Speech Processing; T. W. Parsons. New York; McGraw Hill
- 1986
- * Computer Speech Processing; ed Frank Fallside, William A. Woods
- Englewood Cliffs: Prentice-Hall, c1985
- * Digital speech processing : speech coding, synthesis, and
- recognition edited by A. Nejat Ince; Kluwer Academic Publishers,
- Boston, c1992
- * Speech science and technology; edited by Shuzo Saito pub. Ohmsha,
- Tokyo, c1992
- * Speech analysis; edited by Ronald W. Schafer, John D. Markel New
- York, IEEE Press, c1979
- * Douglas O'Shaughnessy -- Speech Communication: Human and Machine
- Addison Wesley series in Electrical Engineering: Digital Signal
- Processing, 1987.
- * Discrete-time processing of speech signals; John R Deller, John G
- Proakis, John H L Hansen; Macmillan 1993.
- * Signal processing of speech; F J Owens; Macmillan 1993.
-
- _________________________________________________________________
-
- Q2.6: WHAT SPEECH SAMPLING AND SIGNAL PROCESSING HARDWARE CAN I USE?
-
- In addition to the following information, have a look at the Audio
- File format document prepared by Guido van Rossum (see details in
- Section 1.8).
-
- Can anyone provide information on Mac, SGI, NeXT and other hardware?
-
- Sun standard audio port: SPARC I & II
- * Input and Output: 1 channel, 8 bit mu-law encoded, 8kHz sample
- rate. This provides telephone quality sampling.
-
- Sun standard audio port (SPARC 10 & 20)
- * Input and Output: Stereo (2 channels). 16-bit linear sampling.
- Multiple sample rates (48000, 44100, 37800, 32000, 22050, 18900,
- 16000, 11025, 9600, 8000 Hz)
-
- Ariel Signal Processors
- * Platform: Various
- * Description: A range of signal I/O, A/D, D/A and DSP products
- are available. There are too many to list.
- * Contact:
- Ariel Corp.
- 433 River Road, Highland Park, NJ 08904.
- Ph: 908-249-2900 Fax: 908-249-2123 DSP BBS: 908-249-2124
-
- IBM RS/6000 ACPA (Audio Capture and Playback Adapter)
- * Description: The card supports PCM, Mu-Law, A-Law and ADPCM at
- 44.1kHz (& 22.05, 11.025, 8kHz) with 16-bits of resolution in
- stereo. The card has a built-in DSP (don't know which one). The
- device also supports various formats for the output data, like
- big-endian, twos complement, etc. Good noise immunity.
-
- The card is used for IBM's VoiceServer (they use the DSP for
- speech recognition). Apparently, the IBM VoiceServer has a
- speaker-independent vocabulary of over 20,000 words and each ACPA
- can support two independent sessions at once.
- * Cost: $US495
- * Contact: ?
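
Sample formats such as "big-endian, twos complement" matter as soon as data moves between machines. As a minimal sketch (the function name is hypothetical and is not tied to the ACPA or to any driver), a 16-bit big-endian two's complement sample can be read from two bytes without depending on the byte order of the host:

```c
#include <stdint.h>

/* Decode one 16-bit big-endian two's complement sample from two bytes. */
int16_t be16_to_sample(const unsigned char bytes[2])
{
    int32_t value = ((int32_t)bytes[0] << 8) | (int32_t)bytes[1];

    if (value & 0x8000)
        value -= 0x10000;   /* sign bit set: map 32768..65535 to -32768..-1 */
    return (int16_t)value;
}
```

Swapping the roles of bytes[0] and bytes[1] gives the little-endian equivalent.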
-
- Sound Galaxy NX, Aztech Systems
- * Platform: PC - DOS, Windows 3.1
- * Cost: ?
- * Input: 8bit linear, 4-22 kHz.
- * Output: 8bit linear, 4-44.1 kHz
- * Misc: 11-voice FM Music Synthesizer YM3812; Built-in power
- amplifier; DSP signal processing support - ST70019SB; Hardware
- ADPCM decompression (2:1, 3:1, 4:1); "AdLib" and "Sound Blaster"
- compatibility. Software includes a simple Text-to-Speech program
- "Monologue".
-
- Sound Galaxy NX PRO, Aztech Systems
- * Platform: PC - DOS, Windows 3.1
- * Cost: ?
- * Input: 2 * 8bit linear, 4-22.05 kHz (stereo), 4-44.1 kHz (mono).
- * Output: 2 * 8bit linear, 4-44.1 kHz (stereo/mono)
- * Misc: 20-voice FM Music Synthesizer; Built-in power amplifier;
- Stereo Digital/Analog Mixer; Configuration in EEPROM. Hardware
- ADPCM decompression (2:1, 3:1, 4:1). Includes DSP signal processing
- support. "AdLib" and "Sound Blaster Pro II" compatibility.
- Software includes a simple Text-to-Speech program "Monologue" and
- Sampling laboratory for Windows 3.1: WinDAT.
- * Contact: USA (510)6238988
-
- ATI Stereo F/X Sound Board
- * Platform: PC XT or AT - DOS, Windows 3.0, 3.1
- * Cost: $120 Canadian
- * Description: Input - 8 bit ADC, 44.1 kHz mono, 22.05 kHz Stereo.
- Output - Dynamic range = 48 dB, 32 anti-aliasing filters. Adds
- Stereo effect to existing mono Adlib or Sound Blaster apps.
- 11-voice YAMAHA FM Music Synthesizer. Built-in 8 watt power
- amplifier, 4 watts per channel. Volume ctrl on rear. 2 Joystick
- input, software setup (no switches), software included. "AdLib"
- and "Sound Blaster" compatibility. DMA support for high speed
- digital audio. ADPCM decomp @ 4:1, 3:1, 2:1. Will play .WAV files.
- Optional MIDI I/O port $79. (MIDI IN, OUT, THRU, and sequencer).
- * Contact:
- ATI Technologies Inc.
- 3761 Victoria Park Avenue, Scarborough, Ontario
- CANADA, M1W 3S2
- Ph: (416) 756-0711 Fax: (416) 756-0720
- BBS: (416) 764-9404 (9600 baud N.8.1)
-
- Other PC Sound Cards
- =================================================================================
- Sound card     Stereo/mono               Compatible     Included       Voices
-                & sample rate             with           ports
- =================================================================================
- Adlib Gold     stereo: 8-bit 44.1kHz     Adlib ?        audio in/out,  20 (opl3)
- 1000                   16-bit 44.1kHz                   mic in,        +2 digital
-                mono:   8-bit 44.1kHz                    joystick,      channels
-                        16-bit 44.1kHz                   MIDI
-
- Sound Blaster  mono:   8-bit 22.1kHz     Adlib          audio in/out,  11 synth.
-                                          FM synth with  joystick
-                                          2 operators
-
- Sound Blaster  stereo: 8-bit 22.05kHz    Adlib,         audio in/out,  22
- Pro Basic      mono:   8-bit 44.1kHz     Sound Blaster  joystick
-
- Sound Blaster  stereo: 8-bit 22.05kHz    Adlib,         audio in/out,  11
- Pro            mono:   8-bit 44.1kHz     Sound Blaster  joystick,
-                                                         MIDI, SCSI
-
- Sound Blaster  stereo: 8-bit 4-44.1kHz   Sound Blaster  audio in/out,  20
- 16 ASP         stereo: 16-bit 4-44.1kHz                 joystick,
-                                                         MIDI
-
- Audio Port     mono:   8-bit 22.05kHz    Adlib,         audio in/out,  11
-                                          Sound Blaster  joystick
-
- Pro Audio      stereo: 8-bit 44.1kHz     Adlib,         audio in/out,  20
- Spectrum +                               Pro Audio      joystick
-                                          Spectrum
-
- Pro Audio      stereo: 16-bit 44.1kHz    Adlib,         audio in/out,  20
- Spectrum 16                              Pro Audio      joystick,
-                                          Spectrum,      MIDI, SCSI
-                                          Sound Blaster
-
- Thunder Board  stereo: 8-bit 22kHz       Adlib,         audio in/out,  11
-                                          Sound Blaster  joystick
-
- Gravis         stereo: 8-bit 44.1kHz     Adlib,         audio line     32 sampled,
- Ultrasound     mono:   8-bit 44.1kHz     Sound Blaster  in/out,        32 synth.
-                (w/16-bit daughtercard)                  amplified out,
-                stereo: 16-bit 44.1kHz                   mic in, CD
-                mono:   16-bit 44.1kHz                   audio in,
-                                                         daughterboard
-                                                         ports (for
-                                                         SCSI and
-                                                         16-bit)
-
- MultiSound     stereo: 16-bit 44.1kHz    Nothing        audio in/out,  32 sampled
-                64x oversampling                         joystick,
-                                                         MIDI
-
- =================================================================================
-
- _________________________________________________________________
-
- Q2.7: HOW DO I CONVERT TO/FROM MU-LAW FORMAT?
-
- Mu-law coding is a form of compression for audio signals including
- speech. It is widely used in the telecommunications field because it
- improves the signal-to-noise ratio without increasing the amount of
- data. Typically, mu-law compressed speech is carried in 8-bit samples.
- It is a companding technique. That means that it carries more
- information about the smaller signals than about larger signals.
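
For reference, the continuous mu-law compression curve that the 8-bit code approximates is usually written as follows, where x is the input normalised to the range -1 to 1 and mu = 255 in the North American and Japanese telephone networks. This formula is standard background rather than something stated in the original posting.

```latex
F(x) = \operatorname{sgn}(x)\,\frac{\ln\bigl(1 + \mu\,|x|\bigr)}{\ln(1 + \mu)},
\qquad -1 \le x \le 1, \quad \mu = 255
```

Because the curve is steep near zero and flat near full scale, small amplitudes get finer quantization steps than large ones.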
-
- On SUN Sparc systems have a look in the directory /usr/demo/SOUND.
- Included are table lookup macros for ulaw conversions. [Note however
- that not all systems will have /usr/demo/SOUND installed as it is
- optional - see your system admin if it is missing.]
-
- OR, here is some sample conversion code in C.
- #include <stdio.h>
-
- unsigned char linear2ulaw(/* int */);
- int ulaw2linear(/* unsigned char */);
-
- /*
- ** This routine converts from linear to ulaw.
- **
- ** Craig Reese: IDA/Supercomputing Research Center
- ** Joe Campbell: Department of Defense
- ** 29 September 1989
- **
- ** References:
- ** 1) CCITT Recommendation G.711 (very difficult to follow)
- ** 2) "A New Digital Technique for Implementation of Any
- ** Continuous PCM Companding Law," Villeret, Michel,
- ** et al. 1973 IEEE Int. Conf. on Communications, Vol 1,
- ** 1973, pg. 11.12-11.17
- ** 3) MIL-STD-188-113,"Interoperability and Performance Standards
- ** for Analog-to-Digital Conversion Techniques,"
- ** 17 February 1987
- **
- ** Input: Signed 16 bit linear sample
- ** Output: 8 bit ulaw sample
- */
-
- #define ZEROTRAP /* turn on the trap as per the MIL-STD */
- #undef ZEROTRAP
- #define BIAS 0x84 /* define the add-in bias for 16 bit samples */
- #define CLIP 32635
-
- unsigned char linear2ulaw(sample) int sample; {
- static int exp_lut[256] = {0,0,1,1,2,2,2,2,3,3,3,3,3,3,3,3,
- 4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
- 5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
- 5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
- 6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
- 6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
- 6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
- 6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
- 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
- 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
- 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
- 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
- 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
- 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
- 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
- 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7};
- int sign, exponent, mantissa;
- unsigned char ulawbyte;
-
- /* Get the sample into sign-magnitude. */
- sign = (sample >> 8) & 0x80; /* set aside the sign */
- if(sign != 0) sample = -sample; /* get magnitude */
- if(sample > CLIP) sample = CLIP; /* clip the magnitude */
-
- /* Convert from 16 bit linear to ulaw. */
- sample = sample + BIAS;
- exponent = exp_lut[( sample >> 7 ) & 0xFF];
- mantissa = (sample >> (exponent + 3)) & 0x0F;
- ulawbyte = ~(sign | (exponent << 4) | mantissa);
- #ifdef ZEROTRAP
- if (ulawbyte == 0) ulawbyte = 0x02; /* optional CCITT trap */
- #endif
-
- return(ulawbyte);
- }
-
- /*
- ** This routine converts from ulaw to 16 bit linear.
- **
- ** Craig Reese: IDA/Supercomputing Research Center
- ** 29 September 1989
- **
- ** References:
- ** 1) CCITT Recommendation G.711 (very difficult to follow)
- ** 2) MIL-STD-188-113,"Interoperability and Performance Standards
- ** for Analog-to-Digital Conversion Techniques,"
- ** 17 February 1987
- **
- ** Input: 8 bit ulaw sample
- ** Output: signed 16 bit linear sample
- */
-
- int ulaw2linear(ulawbyte) unsigned char ulawbyte; {
- static int exp_lut[8] = { 0, 132, 396, 924, 1980, 4092, 8316, 16764 };
- int sign, exponent, mantissa, sample;
-
- ulawbyte = ~ulawbyte;
- sign = (ulawbyte & 0x80);
- exponent = (ulawbyte >> 4) & 0x07;
- mantissa = ulawbyte & 0x0F;
- sample = exp_lut[exponent] + (mantissa << (exponent + 3));
- if(sign != 0) sample = -sample;
-
- return(sample);
- }
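
The same mapping can be restated in ANSI C without the 256-entry lookup table, which makes it easy to sanity-check a build. This is a sketch rather than a replacement for the routines above: the names mulaw_encode and mulaw_decode are invented here, the segment is found with a loop instead of exp_lut, and the optional zero trap is left out.

```c
#include <stdint.h>

#define MULAW_BIAS 0x84    /* same add-in bias as the routine above */
#define MULAW_CLIP 32635   /* same clip level as the routine above */

/* Encode one signed 16-bit linear sample as an 8-bit mu-law byte. */
uint8_t mulaw_encode(int16_t pcm)
{
    int sample = pcm;
    int sign = (sample < 0) ? 0x80 : 0x00;
    int exponent = 7;
    int mask, mantissa;

    if (sign) sample = -sample;                  /* magnitude */
    if (sample > MULAW_CLIP) sample = MULAW_CLIP;
    sample += MULAW_BIAS;

    /* Segment number = position of the highest set bit in bits 14..7. */
    for (mask = 0x4000; (sample & mask) == 0 && exponent > 0; mask >>= 1)
        exponent--;

    mantissa = (sample >> (exponent + 3)) & 0x0F;
    return (uint8_t)~(sign | (exponent << 4) | mantissa);
}

/* Decode one 8-bit mu-law byte to a signed 16-bit linear sample. */
int16_t mulaw_decode(uint8_t code)
{
    int exponent, mantissa, sample;

    code = (uint8_t)~code;                       /* bits are stored inverted */
    exponent = (code >> 4) & 0x07;
    mantissa = code & 0x0F;
    sample = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS;
    return (int16_t)((code & 0x80) ? -sample : sample);
}
```

With these definitions, mulaw_encode(0) returns 0xFF and mulaw_decode(0x80) returns 32124, the largest value the code can represent. Every byte except 0x7F (negative zero) survives a decode followed by an encode unchanged.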
-
- _________________________________________________________________
-
-
- ===========================================================================
-
-
- FAQ SECTION 3 - Speech Coding and Compression
-
- Q3.1: SPEECH COMPRESSION TECHNIQUES.
-
- Can anyone provide a 1-2 page summary on speech compression?
-
- Note: the FAQ for comp.compression includes a few questions and
- answers on the compression of speech.
- _________________________________________________________________
-
- Q3.2: WHAT ARE SOME GOOD REFERENCES/BOOKS ON CODING/COMPRESSION?
- * Douglas O'Shaughnessy -- Speech Communication: Human and Machine
- Addison Wesley series in Electrical Engineering: Digital Signal
- Processing, 1987.
- * Bishnu Atal in ed. Fallside, F. and W. Woods, ed. Computer Speech
- Processing. London: Prentice/Hall International, 1985.
- * Makhoul, J. "Linear Prediction: A Tutorial Review." Proc. of the
- IEEE 63 (1975): 561 - 580.
-
- _________________________________________________________________
-
- Q3.3: WHAT SPEECH COMPRESSION/CODING SOFTWARE IS AVAILABLE?
-
- Note: there are two types of speech compression technique referred to
- below. Lossless techniques preserve the speech exactly through a
- compression-decompression phase. Lossy techniques do not preserve the
- speech perfectly. As a general rule, the more you compress speech, the
- more the quality degrades.
-
- File format conversion
- * Platform: SUN OS?
- * Description: Conversion utility able to encode and decode
- between the following formats: G.723, G.721, A-law, u-law and
- linear.
- * Availability: By anonymous ftp from
- + ftp://ftp.cwi.nl/pub/audio/ccitt-adpcm.tar.Z
-
- shorten - a lossless compressor for speech signals
- * Platform: UNIX/DOS
- * Description: A fast waveform coder suitable for speech and
- music signals in a wide variety of file formats. The degree of
- compression is adjustable from lossless to three bits a sample.
- 16bit 16kHz speech generally attains 50% lossless compression and
- 16:3 compression of CDROM quality speech is obtainable with only
- minor audible degradation.
- * Availability: Anonymous ftp - UNIX and DOS versions are in
- + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/sources/shorten-1.14.tar.Z
- + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/sources/shn114.zip
-
- 32 kbps ADPCM
- * Platform: SGI and Sun Sparcs
- * Description: 32 kbps ADPCM C-source code (G.721 compatibility is
- uncertain)
- * Contact: Jack Jansen
- * Availability: Anonymous ftp
- + ftp://ftp.cwi.nl/pub/adpcm.shar
-
- GSM 06.10 Compression
- * Platform: Unix; faster than real time on most Sun SPARCstations
- * Description: GSM 06.10 is a standardized lossy speech
- compression employed by most European wireless telephones. It uses
- RPE/LTP (regular pulse excitation/long term prediction) coding to
- compress frames of 160 13-bit samples (8 kHz sampling rate, i.e. a
- frame rate of 50 Hz) into 260 bits.
- * Contact: GSM 06.10 support and implementation
- toast@cs.tu-berlin.de
- * Availability: An implementation can be ftp'ed from:
- + ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/gsm-1.0.4.tar.Z
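
The figures in the GSM 06.10 description can be checked with a little arithmetic. The helper names below are made up for this illustration.

```c
/* Frames per second for a codec that consumes fixed-size frames. */
long frames_per_second(long sample_rate_hz, long samples_per_frame)
{
    return sample_rate_hz / samples_per_frame;
}

/* Bit rate (bits per second) of a stream of fixed-size frames. */
long codec_bit_rate(long bits_per_frame, long frame_rate_hz)
{
    return bits_per_frame * frame_rate_hz;
}
```

frames_per_second(8000, 160) is 50, so codec_bit_rate(260, 50) gives 13000 bits per second. The uncompressed input is 160 x 13 = 2080 bits per frame, so the compression ratio is 2080:260, or 8:1.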
-
- G.711/721/723 Compression
- * Description:
- + G.711 : CCITT u-law and A-law compression
- + G.721 : CCITT 32 kbps ADPCM coder
- + G.723 : CCITT 24 kbps and 40 kbps ADPCM coders
- * Availability: By email to teledoc@itu.arcom.ch, with
- GET ITU-3022
- as the *only* line in the body of the message. This is also available
- by anonymous ftp from:
- + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/sources/G711_G721_G723.tar.Z
-
- G.728 Compression
- * Description: G.728 low delay CELP package written by Alex
- Zatsman of Analog Devices, Inc.
- * Availability: By anonymous ftp from
- + ftp://dspsun.eas.asu.edu/pub/speech/ldcelp.tgz
-
- G.728 LD-CELP vocoder
- * Platform: Analog Devices ADSP-2171
- * Description: Real-time, full-duplex G.728 LD-CELP vocoder that
- runs on a single Analog Devices ADSP-2171. Source and object code
- available for a one-time license fee.
- * Contact:
- Cole Erskine
- Analogical Systems
- 299 California Avenue, Suite 120
- Palo Alto, CA 94306, USA
- Tel:(415) 323-3232 FAX:(415) 323-4222
- Internet: cole@analogical.com
-
- U.S.F.S. 1016 CELP vocoder for DSP56001
- * Platform: DSP56001
- * Description: Real-time U.S.F.S. 1016 CELP vocoder that runs on a
- single 27MHz Motorola DSP56001. Free demo software available for
- PC-56 and PC-56D. Source and object code available for a one-time
- license fee.
- * Contact:
- Cole Erskine
- Analogical Systems
- 299 California Avenue, Suite 120
- Palo Alto, CA 94306, USA
- Tel:(415) 323-3232 FAX:(415) 323-4222
- Email: cole@analogical.com
-
- 8 Kbit/s CELP on the TMS320C5x family of DSP chips
- * Description: For low bandwidth transmission of voice, compact
- voice storage for archival purposes, low-cost digital answering
- machines and efficient storage for voice mail. Features :
- + near toll quality at 8 Kb/s.
- + Variable rate option with 1 Kb/s silence encoding.
- + Implemented on a fixed-point processor for lower system cost.
- + Attractive licensing scheme.
- + Future availability of 4 Kb/s.
- + Custom rates possible.
- Capacity :
- + Two half-duplex or one full duplex channels on the 20 MIPS
- 'C5x (at 95% and 55% CPU utilization respectively).
- + Two full duplex channels on the 28.6 MIPS 'C5x (at 77% CPU
- utilization).
- + Requires 9 K-words program memory and 3 K-words data memory.
- + Decoding in real-time on a 486 class CPU.
- * Contact:
- CVI Inc.
- 443 Vienna Cres. North Vancouver, BC, Canada V7N 3B3
- Tel: (604) 987 1719 Fax: (604) 986 8139
- Email: cvi@extropia.wimsey.com
-
- CELP 3.2a & LPC
- * Platform: Sun (the makefiles & source can be modified for other
- platforms)
- * Description: CELP is a lossy compression technique. This package
- contains Fortran and C simulation source code for version 3.2a of
- the U.S. DoD's Federal-Standard-1016 based 4800 bps code excited
- linear prediction voice coder (CELP 3.2a). It is available for
- worldwide distribution (on DOS diskettes, but configured to compile
- on Sun SPARC stations) from NTIS and DTIC. Example input and
- processed speech files are included. A Technical Information
- Bulletin (TIB), "Details to
- Assist in Implementation of Federal Standard 1016 CELP," and the
- official standard, "Federal Standard 1016, Telecommunications:
- Analog to Digital Conversion of Radio Voice by 4,800 bit/second
- Code Excited Linear Prediction (CELP)," are also available.
- * Availability 1: Through the National Technical Information
- Service:
- NTIS
- U.S. Department of Commerce
- 5285 Port Royal Road, Springfield, VA 22161, USA
-
- The "AD" ordering number for the CELP software is AD M000 118 (US$
- 90.00) and for the TIB it's AD A256 629 (US$ 17.50). The LPC-10
- standard, described below, is FIPS Pub 137 (US$ 12.50). There is a
- $3.00 shipping charge on all U.S. orders. The telephone number for
- their automated system is 703-487-4650, or 703-487-4600 if you'd
- prefer to talk with a real person.
-
- (U.S. DoD personnel and contractors can receive the package from
- the Defense Technical Information Center: DTIC, Building 5,
- Cameron Station, Alexandria, VA 22304-6145. Their telephone number
- is 703-274-7633.)
- * Availability 2: By anonymous ftp from:
- + ftp://ftp.super.org/pub/celp_3.2a.tar.Z (192.31.192.1)
- + OR ftp://svr-ftp.eng.cam.ac.uk/comp.speech/sources/celp_3.2a.tar.Z
- * Misc: The following articles describe the Federal-Standard-1016
- 4.8-kbps CELP coder (it's unnecessary to read more than one):
- + Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C.
- Welch, "The Federal Standard 1016 4800 bps CELP Voice Coder,"
- Digital Signal Processing, Academic Press, 1991, Vol. 1, No.
- 3, p. 145-155.
- + Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C.
- Welch, "The DoD 4.8 kbps Standard (Proposed Federal Standard
- 1016)," in Advances in Speech Coding, ed. Atal, Cuperman and
- Gersho, Kluwer Academic Publishers, 1991, Chapter 12, p.
- 121-133.
- + Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C.
- Welch, "The Proposed Federal Standard 1016 4800 bps Voice
- Coder: CELP," Speech Technology Magazine, April/May 1990, p.
- 58-64.
-
- The U.S. DoD's Federal-Standard-1015/NATO-STANAG-4198 based 2400
- bps linear prediction coder (LPC-10) was republished as a Federal
- Information Processing Standards Publication 137 (FIPS Pub 137).
- It is described in:
- + Thomas E. Tremain, "The Government Standard Linear Predictive
- Coding Algorithm: LPC-10," Speech Technology Magazine, April
- 1982, p. 40-49.
-
- There is also a section about FS-1015 in the book:
- + Panos E. Papamichalis, Practical Approaches to Speech Coding,
- Prentice-Hall, 1987.
-
- The voicing classifier used in the enhanced LPC-10 (LPC-10e) is
- described in:
- + Campbell, Joseph P., Jr. and T. E. Tremain, "Voiced/Unvoiced
- Classification of Speech with Applications to the U.S.
- Government LPC-10E Algorithm," Proceedings of the IEEE
- International Conf. on Acoustics, Speech, and Signal
- Processing, 1986, p. 473-6.
- Copies of the official standard, "Federal Standard 1016,
- Telecommunications: Analog to Digital Conversion of Radio Voice by
- 4,800 bit/second Code Excited Linear Prediction (CELP)" are
- available for US$ 5.00 each from:
- GSA Federal Supply Service Bureau
- Specification Section, Suite 8100
- 470 E. L'Enfant Place, S.W.
- Washington, DC 20407
- (202)755-0325
- Realtime DSP code for FS-1015 and FS-1016 is sold by:
- John DellaMorte, DSP Software Engineering
- 165 Middlesex Tpk, Suite 206, Bedford, MA 01730, USA
- Ph: 1-617-275-3733 Fax: 1-617-275-4323
- dspse.bedford@channel1.com
- DSP Software Engineering's FS-1016 code can run on DSP Research's
- Tiger 30 (a PC board with a TMS320C3x and analog interface suited
- to development work).
- DSP Research
- 1095 E. Duane Ave, Sunnyvale, CA 94086, USA
- Ph: (408)773-1042 Fax: (408)736-3451
-
- _________________________________________________________________
-
-
- ===========================================================================
-
-
- FAQ SECTION 4 - Natural Language Processing
-
- There is now a newsgroup specifically for Natural Language Processing.
- It is called comp.ai.nat-lang.
-
- There is also a lot of useful information on Natural Language
- Processing in the FAQ for comp.ai. That FAQ lists available software
- and useful references. It includes a substantial list of software,
- documentation and other info available by ftp.
- _________________________________________________________________
-
- Q4.1: WHAT ARE SOME GOOD REFERENCES/BOOKS ON NLP?
-
- Take a look at the FAQ for the "comp.ai" newsgroup as it also includes
- some useful references.
- * James Allen: Natural Language Understanding, (Benjamin/Cummings
- Series in Computer Science) Menlo Park: Benjamin/Cummings
- Publishing Company, 1987.
- + This book consists of four parts: syntactic processing,
- semantic interpretation, context and world knowledge, and
- response generation.
- * G. Gazdar and C. Mellish, Natural Language Processing in Prolog,
- Addison Wesley, 1989
- * G. Gazdar and C. Mellish, Natural Language Processing in Lisp,
- Addison Wesley, 1989
- * G. Gazdar and C. Mellish, Natural Language Processing in Pop11,
- Addison Wesley, 1989
- + Emphasis on parsing, especially unification-based parsing,
- lots of details on the lexicon, feature propagation, etc.
- Fair coverage of semantic interpretation, inference in
- natural language processing, and pragmatics; much less
- extensive than in Allen's book, but more formal. There are
- three versions, one for each programming language listed
- above, with complete code.
- * Shapiro, Stuart C.: Encyclopedia of Artificial Intelligence Vol.1
- and 2. New York: John Wiley & Sons, 1990.
- + There are articles on the different areas of natural language
- processing which also give additional references.
- * Paris, Ce'cile L.; Swartout, William R.; Mann, William C.:
- Natural Language Generation in Artificial Intelligence and
- Computational Linguistics. Boston: Kluwer Academic Publishers,
- 1991.
- + The book describes the most current research developments in
- natural language generation and all aspects of the generation
- process are discussed. The book is comprised of three
- sections: one on text planning, one on lexical choice, and
- one on grammar.
- * Readings in Natural Language Processing, ed by B. Grosz, K.
- Sparck Jones and B. Webber, Morgan Kaufmann, 1986
- + A collection of classic papers on Natural Language
- Processing. Fairly complete at the time the book came out
- (1986) but now seriously out of date. Still useful for ATN's,
- etc.
- * Klaus K. Obermeier, Natural Language Processing Technologies in
- Artificial Intelligence: The Science and Industry Perspective,
- Ellis Horwood Ltd, John Wiley & Sons, Chichester, England, 1989.
-
- Journals
-
- The major journals of the field are
- * Computational Linguistics and Cognitive Science for the
- artificial intelligence aspects,
- * Cognition for the psychological aspects,
- * Language, Linguistics and Philosophy, and Linguistic
- Inquiry for the linguistic aspects.
- * Artificial Intelligence occasionally has papers on natural
- language processing.
-
- Conferences
-
- The major conferences of the field are
- * ACL (held every year)
- * COLING (held every two years)
-
- Most AI conferences have an NLP track; AAAI, ECAI, IJCAI and the
- Cognitive Science Society conferences usually are the most interesting
- for NLP. CUNY is an important psycholinguistic conference. There are
- lots of linguistic conferences: the most important seem to be NELS,
- the conference of the Chicago Linguistic Society (CLS), WCCFL, LSA,
- the Amsterdam Colloquium, and SALT.
-
- _________________________________________________________________
-
- Q4.2: WHAT NLP SOFTWARE IS AVAILABLE?
-
- Check the comments at the start of this section for information on
- other newsgroups and sources of information on NLP.
-
- Natural Language Software Registry (NLSR) - NLP Tools
- * The Natural Language Software Registry is available from the
- German Research Institute for Artificial Intelligence (DFKI) in
- Saarbruecken. Its purpose is to facilitate the exchange and
- evaluation of natural language processing software within the
- research community. To this end, the NLSR is cataloging natural
- language software projects, both commercial and non-commercial.
- The new updated and enlarged version contains more than 100
- descriptions of natural language processing software. Registry
- listings include:
- + speech signal processors, such as the Computerized Speech Lab
- (Kay Elemetrics)
- + morphological analyzers, such as PC-KIMMO (Summer Institute
- for Linguistics)
- + parsers, such as Alveytools (University of Edinburgh)
- + semantic and pragmatic analyzers, such as NLL (University of
- the Saarland, Germany)
- + generation programs, such as FUF (Ben Gurion University of
- the Negev)
- + knowledge representation systems, such as Rhet (University of
- Rochester)
- + multicomponent systems, such as ELU (ISSCO), PENMAN (ISI),
- Pundit (UNISYS), SNePS (SUNY Buffalo)
- + NLP-Tools, such as GULP (University of Georgia) or Linguist
- (Kansai Research Laboratory)
- + applications programs (misc.)
- * If you have developed a piece of software for natural language
- processing that other researchers might find useful, you can
- include it by returning the questionnaire available from the
- sources below.
- * ftp://ftp.dfki.uni-sb.de/pub/registry
- * e-mail: registry@dfki.uni-sb.de
- * post:
- Natural Language Software Registry
- Deutsches Forschungsinstitut fuer Kuenstliche Intelligenz (DFKI)
- Stuhlsatzenhausweg 3
- D-66123 Saarbruecken
- Germany
- * Other ftp sites are
- + ftp://crlftp.nmsu.edu/pub/non-lexical/NL_Software_Registy
- + ftp://dri.cornell.edu/pub/Natural_Language_Software_Registry
-
- Part of Speech Tagger
- * Description: A rule-based part of speech tagger developed by
- Eric Brill. For a detailed description of the tagger see chapter 6
- of his thesis.
- * Availability: The tagger and description are available by
- anonymous ftp from
- + ftp://lightning.lcs.mit.edu/pub/BRILL/Programs & Papers
-
- _________________________________________________________________
-
-
-
-
- Andrew Hunt
- ---
- Speech Technology Research Group Ph: 61-2-351 4509
- Dept. of Electrical Engineering Fax: 61-2-351 3847
- University of Sydney, NSW, 2006, Australia email: andrewh@speech.su.oz.au
-
-